SPEECH COMMAND INPUT RECOGNITION SYSTEM FOR INTERACTIVE
COMPUTER DISPLAY WITH SPEECH CONTROLLED DISPLAY OF
RECOGNIZED COMMANDS

Cross-Reference to Related Copending Patent Applications 

The following patent applications, which are 
assigned to the assignee of the present invention and 
filed concurrently herewith, cover subject matter related 
to the subject matter of the present invention: "SPEECH 
COMMAND INPUT RECOGNITION SYSTEM FOR INTERACTIVE COMPUTER 
DISPLAY WITH MEANS FOR CONCURRENT AND MODELESS 
DISTINGUISHING BETWEEN SPEECH COMMANDS AND SPEECH QUERIES 
FOR LOCATING COMMANDS", Scott A. Morgan et al. (Attorney 
Docket No. AT9-98-344); "SPEECH COMMAND INPUT RECOGNITION
SYSTEM FOR INTERACTIVE COMPUTER DISPLAY WITH TERM 
WEIGHTING MEANS USED IN INTERPRETING POTENTIAL COMMANDS 
FROM RELEVANT SPEECH TERMS", Scott A. Morgan et al. 
(Attorney Docket No. AT9-98-342); "SPEECH COMMAND INPUT 
RECOGNITION SYSTEM FOR INTERACTIVE COMPUTER DISPLAY WITH 
INTERPRETATION OF ANCILLARY RELEVANT SPEECH QUERY TERMS 
INTO COMMANDS", Scott A. Morgan et al. (Attorney Docket 
No. AT9-98-343); and "METHOD AND APPARATUS FOR PRESENTING 
PROXIMAL FEEDBACK IN VOICE COMMAND SYSTEMS", Alan R. 
Tannenbaum (Attorney Docket No. AT9-97-771). 

Technical Field 

The present invention relates to interactive 
computer controlled display systems with speech command 
input and more particularly to such systems which present 
display feedback to the interactive users.

Background of Related Art

The 1990's decade has been marked by a technological
revolution driven by the convergence of the data 
processing industry with the consumer electronics 
industry. This advance has been even further accelerated 
by the extensive consumer and business involvement in the 
Internet over the past few years. As a result of these 
changes it seems as if virtually all aspects of human 
endeavor in the industrialized world require 
human/computer interfaces. There is a need to make 
computer directed activities accessible to people who, up 
to a few years ago, were computer illiterate or, at best, 
computer indifferent.

Thus, there is continuing demand for interfaces to 
computers and networks which improve the ease of use for 
the interactive user to access functions and data from 
the computer. With desktop-like interfaces including 
windows and icons, as well as three-dimensional virtual 
reality simulating interfaces, the computer industry has 
been working hard to fulfill such interface needs by 
making interfaces more user friendly by making the 
human/computer interfaces closer and closer to real world 
interfaces, e.g. human/human interfaces. In such an 
environment it would be expected that speaking to the 
computer in natural language would be a very natural way 
of interfacing with the computer for even novice users. 
Despite the potential advantages of speech recognition 
computer interfaces, this technology has been relatively 
slow in gaining extensive user acceptance. 

Speech recognition technology has been available for 
over twenty years but it has only been recently that it 
is beginning to find commercial acceptance, particularly 
with speech dictation or "speech to text" systems, such
as those marketed by International Business Machines
Corporation (IBM) and Kurzweil Corporation. That aspect
of the technology is now expected to have accelerated
development until it will have a substantial niche in the
word processing market. On the other hand, a more
universal application of speech recognition input to 
computers, which is still behind expectations in user 
acceptance, is in command and control technology wherein, 
for example, a user may navigate through a computer 
system's graphical user interface (GUI) by the user 
speaking the commands which are customarily found in the 
system's menu text, icons, labels, buttons, etc.

Many of the deficiencies in speech recognition, both 
in word processing and in command technologies, are due 
to inherent voice recognition errors due in part to the 
status of the technology and in part to the variability 
of user speech patterns and the user's ability to 
remember the specific commands necessary to initiate 
actions. As a result, most current voice recognition 
systems provide some form of visual feedback which 
permits the user to confirm that the computer understands 
his speech utterances. In word processing, such visual 
feedback is inherent in this process since the purpose of 
the process is to translate from the spoken to the 
visual. That may be one of the reasons that the word 
processing applications of speech recognition have 
progressed at a faster pace. In any event, in all voice 
recognition systems with visual feedback, at some stage, 
the interactive user is required to make some manual 
input, e.g. through a mouse or a keyboard. The need for 
such manual operations still gets in the way of 
interactive users who, because of a lack of computer 
skills or other reasons, wish to relate to the computer
system in a fully voice activated or conversational
manner.

Summary of the Present Invention 

The present invention provides a solution for users
of voice recognition systems who still need visual
feedback in order to confirm the accuracy of spoken
commands but need to operate in a "hands-off" mode with
respect to computer input. In an interactive computer
controlled display system with speech command input
recognition, the present invention provides a system for
confirming the recognition of a command by first
predetermining a plurality of speech commands for
respectively designating each of a corresponding
plurality of system actions and providing means for
detecting such speech commands. There also are means
responsive to a detected speech command for displaying
said command for a predetermined time period, during
which time the user may give a spoken command to stop the
system action designated by said displayed command. In
the event that said system action is not stopped during
said predetermined time period, the system action
designated by said displayed command will be executed.
The user need not wait for the expiration of the time
period; if he notes that the displayed command is the
right one, he has speech command means for executing the
system action designated by said displayed command prior
to the expiration of said time period. This may be as
simple as just repeating the displayed command.

Brief Description of the Drawings 

The present invention will be better understood and 
its numerous objects and advantages will become more
apparent to those skilled in the art by reference to the
following drawings, in conjunction with the accompanying 
specification, in which: 

Fig. 1 is a block diagram of a generalized data
processing system including a central processing unit
which provides the computer controlled interactive
display system with voice input used in practicing the
present invention;

Fig. 2 is a diagrammatic view of a display screen on
which an interactive dialog panel interface is used for
visual feedback when a speech command input has been
made;

Fig. 3 is the display screen view of Fig. 2 after a
speech command input has been made and part of the time
period for retracting the command has expired;

Fig. 4 is the display screen view of Fig. 3 after a
further part of the time period for retracting the
command has expired;

Fig. 5 is the display screen view of Fig. 4 after
the time period for retracting the command has almost
expired;

Fig. 6 is a flowchart of the basic elements of the
system and program in a computer controlled display
with visual feedback system of the present invention for
enabling the spoken retraction of spoken commands; and

Fig. 7 is a flowchart of the steps involved in
running the program set up in Fig. 6.

Detailed Description of the Preferred Embodiment 

Referring to Fig. 1, a typical data processing
system is shown which may function as the computer
controlled display terminal used in implementing the
system of the present invention by receiving and
interpreting speech input and providing a displayed
feedback of spoken commands and a time period wherein a
user may orally stop a command.

A central processing unit (CPU) 10, such as any PC 
microprocessor in a PC available from IBM or Dell Corp., 
is provided and interconnected to various other 
components by system bus 12. An operating system 41 runs 
on CPU 10, provides control and is used to coordinate the 
function of the various components of Fig. 1. Operating 
system 41 may be one of the commercially available 
operating systems, such as the OS/2 (TM) operating system 
available from IBM (OS/2 is a trademark of International 
Business Machines Corporation); Microsoft's Windows 95 (TM)
or Windows NT (TM), as well as the UNIX or AIX operating
systems. A speech recognition program with visual 
feedback of spoken commands, so that the user may speak 
retractions during set time periods, application 40, to 
be subsequently described in detail, runs in conjunction 
with operating system 41 and provides output calls to the 
operating system 41, which implements the various 
functions to be performed by the application 40. 

A read only memory (ROM) 16 is connected to CPU 10 
via bus 12 and includes the basic input/output system 
(BIOS) that controls the basic computer functions. 
Random access memory (RAM) 14, I/O adapter 18 and 
communications adapter 34 are also interconnected to 
system bus 12. It should be noted that software 
components, including operating system 41 and application 
40, are loaded into RAM 14, which is the computer 
system's main memory. I/O adapter 18 may be a small 
computer system interface (SCSI) adapter that 
communicates with the disk storage device 20, i.e. a hard 
drive. Communications adapter 34 interconnects bus 12
with an outside network enabling the data processing
system to communicate with other such systems over a
local area network (LAN) or wide area network (WAN),
which includes, of course, the Internet. I/O devices are
also connected to system bus 12 via user interface
adapter 22 and display adapter 36. Keyboard 24 and mouse
26 are both interconnected to bus 12 through user
interface adapter 22. Manual I/O devices, such as the
keyboard and the mouse, are shown primarily because they
may be used for ancillary I/O functions not related to
the present invention, which uses primarily spoken
commands. Audio output is provided by speaker 28, and
speech input is made through input device 27, which
is diagrammatically depicted as a microphone and which
accesses the system through an appropriate interface
adapter 22. The speech input and recognition will be
subsequently described in greater detail, particularly
with respect to Fig. 2. Display adapter 36 includes a
frame buffer 39, which is a storage device that holds a
representation of each pixel on the display screen 38.
Images, such as speech input commands, relevant proposed
commands, as well as speech input terminology display
feedback panels, may be stored in frame buffer 39 for
display on monitor 38 through various components, such as
a digital to analog converter (not shown) and the like.
By using the aforementioned I/O devices, a user is
capable of inputting visual information to the system
through the keyboard 24 or mouse 26 in addition to speech
input through microphone 27 and receiving output
information from the system via display 38 or speaker 28.

Voice or speech input is applied through microphone
27 which represents a speech input device. Since the art
of speech terminology and speech command recognition is
an old and well developed one, we will not go into the
hardware and system details of a typical system which may
be used to implement the present invention. It should be
clear to those skilled in the art that the systems and
hardware in any of the following patents may be used:
US5,671,328; US5,133,111; US5,222,146; US5,664,061;
US5,553,121; and US5,157,384. The speech input to the
system could be actual commands, which the system will
recognize, and/or speech terminology, which the user
addresses to the computer so that the computer may
propose appropriate relevant commands through feedback.
The input speech goes through a recognition process which
seeks a comparison to a stored set of commands. If a
command is identified, the actual command will be
displayed first and subsequently carried out after a set
time period during which the command may be vocally
retracted.
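
By way of illustration only, the comparison of input speech to a
stored set of commands may be sketched as follows. The sketch
assumes that the recognizer has already converted the speech to
text; the function name match_command and the stored commands other
than "Paste" are hypothetical and do not appear in this
specification.

```python
from typing import Optional, Set

# Hypothetical stored command set; only "Paste" is named in the text.
STORED_COMMANDS: Set[str] = {"paste", "copy", "cut"}


def match_command(recognized_text: str,
                  commands: Set[str] = STORED_COMMANDS) -> Optional[str]:
    """Return the stored command that the recognized speech matches, or None.

    The comparison ignores case and surrounding whitespace. A real
    recognizer compares acoustic or language models, not plain strings.
    """
    candidate = recognized_text.strip().lower()
    return candidate if candidate in commands else None
```

A return value of None corresponds to speech that is not an actual
command, e.g. speech terminology from which relevant commands might
be proposed.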

Now with respect to Figs. 2 through 5 we will 
provide an illustrative example of how the present 
invention may be used to provide the visual feedback of 
displayed commands, as well as the prompts for retracting 
commands. When the screen image panels are described, it 
will be understood that these may be rendered by storing 
image and text creation programs, such as those in any 
conventional window operating system in the RAM 14 of the 
system of Fig. 1. The display screens of Figs. 2 through
5 are presented to the viewer on display monitor 38 of
Fig. 1. In accordance with conventional techniques, the
user may control the screen interactively through a
conventional I/O device, such as mouse 26, Fig. 1, and
speech input is applied through microphone 27. These
operate through user interface adapter 22 to call upon
programs in RAM 14 cooperating with the operating system
41 to create the images in frame buffer 39 of display
adapter 36 to control the display panels on monitor 38.

The initial display screen of Fig. 2 shows a display
screen with visual feedback display panel 70. In the
panel, window 71 will show the words that the user
speaks; these words may contain commands or may bring
forth associated relevant commands; window 72 will
display all of the commands. At this point, window 72 is
still at a prompt stage suggesting to the user commands
which he may wish to use. The user may issue one of
these commands by speaking it or he may speak another
command. Either way, the result will be the display of
the panel of Fig. 3 with the issued command, which, in
the present example, is one of the suggested commands,
"Paste", being highlighted 75. This commences the run of
the time period, as signified by timer icon 76, during
which the execution of the selected command is delayed,
and the user may stop or retract the command by saying
"NO" or a like command for stopping the execution.
Figs. 4 and 5 are the panel of Fig. 3 showing the advance
of the time period toward expiration as indicated by
the timer icon becoming filled up. It should be noted that
if the user becomes certain that he has the right command
before the time period has expired, he need not wait for
the command execution; he may vocally confirm his
selection by issuing an appropriate command and have the
selected command executed immediately. The system may
conveniently be set up so that all he has to do is repeat
his selected command, e.g. "Paste", and the command will
be executed immediately.
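
The progressive filling of the timer icon in Figs. 3 through 5 may
be approximated, purely as a sketch, by a text bar whose filled
portion is proportional to the elapsed part of the time period. The
function name timer_fill, the bar width and the characters used are
assumptions made for illustration; the specification does not
describe how timer icon 76 is drawn.

```python
def timer_fill(elapsed: float, period: float, width: int = 10) -> str:
    """Render a text stand-in for the timer icon.

    The bar is filled in proportion to elapsed / period, clamped to the
    range 0..1 so that it never underflows or overflows.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    fraction = min(max(elapsed / period, 0.0), 1.0)
    filled = int(fraction * width)
    return "[" + "#" * filled + "." * (width - filled) + "]"
```

For example, halfway through a four second period the call
timer_fill(2, 4) yields a bar that is half filled.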

Now with reference to Figs. 6 and 7 we will describe
a process implemented by the present invention in
conjunction with the flowcharts of these figures. Fig. 6
is a flowchart showing the development of a process
according to the present invention for providing visual
feedback to spoken commands so that the user has a time
period or time delay in the command execution within
which he may vocally confirm or cancel the command.
First, step 80, a set of recognizable spoken commands,
which will drive the system being used, is set up and
stored. Then there are set up appropriate processes to
carry out the action called for by each recognized speech
command, step 81. A process for displaying recognized
speech commands is also set up, step 82. A timer process
for delaying the execution of the selected displayed
command for a predetermined period is set up, step 83. A
command is set up or enabled which may be spoken during
the delay period to stop the execution of the selected
command, step 84. A process is set for stopping the
execution of the selected command responsive to the
issuance of a stop command, step 85. Also, a process is
set up whereby the user may issue a command to confirm
the selected command during the time delay period,
whereupon the selected command will immediately be
executed, step 86.
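
The set up of Fig. 6 may be pictured, as a minimal sketch only, as a
single configuration object. The class name FeedbackSetup, its field
names and the three second default delay are assumptions for
illustration and are not taken from this specification; the display
process of step 82 is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class FeedbackSetup:
    # Steps 80 and 81: each recognizable command mapped to its action.
    actions: Dict[str, Callable[[], None]] = field(default_factory=dict)
    # Step 83: delay before execution, in seconds (value is an assumption).
    delay_seconds: float = 3.0
    # Step 84: spoken words that stop execution ("NO" is named in the text).
    stop_words: Set[str] = field(default_factory=lambda: {"no"})

    def register(self, command: str, action: Callable[[], None]) -> None:
        """Store a recognizable command together with its action."""
        self.actions[command.lower()] = action

    def is_stop(self, utterance: str) -> bool:
        """Step 85 trigger: does the utterance stop the pending command?"""
        return utterance.strip().lower() in self.stop_words

    def is_confirm(self, utterance: str, displayed: str) -> bool:
        """Step 86 trigger: repeating the displayed command confirms it."""
        return utterance.strip().lower() == displayed.strip().lower()
```

Treating a repetition of the displayed command as the confirmation
follows the example given for "Paste"; other confirmation commands
could be substituted.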

With this set up, the running of the process will
now be described with respect to Fig. 7. First, step 90,
a determination is made as to whether there has been a
command recognized by the system. If No, the process is
returned to step 90 where such a command is awaited. If
Yes, then the command is displayed, step 91, so that the
user now has an opportunity to confirm the command during
a period of time where the execution is delayed. This
delay is timed by starting a timer, step 92, after which,
decision step 93, a determination is made as to whether
the time period of delay is over. If No, a further
determination is made as to whether a stop command has
been issued as yet, decision step 94. If Yes, the
execution of the selected command is cancelled, step 95,
and the process is branched via branch "B" to step 98
where a decision is made as to whether the session is
over, in a manner to be subsequently described. If the
decision from step 94 is No, then the process proceeds to
step 96 where a determination is made as to whether the
user has spoken a confirmation of the selected command.
If No, then the process returns to step 93 where the end
of the time delay period is awaited. If the decision
from step 96 is Yes, i.e. the user has spoken a
confirmation of the selected command, or if the decision
from step 93 is Yes, i.e. the time period of delay is
over, then the process goes to step 97 and the selected
command is executed. Then, step 98, a determination is
made as to whether the session is over. If Yes, the
session is exited. If No, the process is returned to
step 90 where the next command is awaited.
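
The decision loop of steps 93 through 97 may be sketched as follows,
under stated assumptions. The delay period is modeled as a fixed
number of polling ticks rather than a real clock, the recognizer
output is supplied as a sequence with None standing for silence, and
the names resolve_displayed_command, heard_per_tick and delay_ticks
are hypothetical. Confirmation is assumed to be a repetition of the
displayed command.

```python
from typing import Iterable, Optional

STOP_WORDS = {"no"}  # "NO" is the stop command named in the text


def resolve_displayed_command(command: str,
                              heard_per_tick: Iterable[Optional[str]],
                              delay_ticks: int) -> str:
    """Decide the fate of a displayed command during its delay period.

    Returns "cancelled" if a stop command is heard before the delay
    expires, otherwise "executed" (by early confirmation or by expiry).
    """
    heard = iter(heard_per_tick)
    for _ in range(delay_ticks):         # step 93: delay not yet over
        utterance = next(heard, None)
        if utterance is None:            # silence on this tick
            continue
        spoken = utterance.strip().lower()
        if spoken in STOP_WORDS:         # step 94 Yes, then step 95
            return "cancelled"
        if spoken == command.lower():    # step 96 Yes, then step 97
            return "executed"
    return "executed"                    # step 93 Yes, then step 97
```

An outer loop, not shown, would call this function for each
recognized command (steps 90 through 92) and repeat until the
session is over (step 98).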

One of the preferred implementations of the present 
invention is as an application program 40 made up of 
programming steps or instructions resident in RAM 14, 
Fig. 1, during computer operations. Until required by 
the computer system, the program instructions may be 
stored in another readable medium, e.g. in disk drive 20, 
or in a removable memory such as an optical disk for use 
in a CD ROM computer input, or in a floppy disk for use 
in a floppy disk drive computer input. Further, the 
program instructions may be stored in the memory of 
another computer prior to use in the system of the 
present invention and transmitted over a LAN or a WAN, 
such as the Internet, when required by the user of the 
present invention. One skilled in the art should
appreciate that the processes controlling the present
invention are capable of being distributed in the form of 
computer readable media of a variety of forms. 

Although certain preferred embodiments have been 
shown and described, it will be understood that many 
changes and modifications may be made therein without 
departing from the scope and intent of the appended 
claims. 



